KDD-SC: Subspace Clustering Extensions for Knowledge Discovery Frameworks

Authors

  • Stephan Günnemann
  • Hardy Kremer
  • Matthias Hannen
  • Thomas Seidl
Abstract

Analyzing high-dimensional data is a challenging task. For such data, traditional clustering algorithms are known to fail to detect meaningful patterns. As a solution, subspace clustering techniques have been introduced. They analyze arbitrary subspace projections of the data to detect clustering structures. In this paper, we present our subspace clustering extension for KDD frameworks, termed KDD-SC. In contrast to existing subspace clustering toolkits, our solution is neither a standalone product nor tightly coupled to a specific KDD framework. Our extension is realized by a common codebase and easy-to-use plugins for three of the most popular KDD frameworks, namely KNIME, RapidMiner, and WEKA. KDD-SC extends these frameworks such that they offer a wide range of subspace clustering functionalities. It provides a multitude of algorithms, data generators, evaluation measures, and visualization techniques specifically designed for subspace clustering. These functionalities integrate seamlessly with the frameworks’ existing features such that they can be flexibly combined. KDD-SC is publicly available on our website.

Similar resources

Subspace outlier mining in large multimedia databases

Increasingly large multimedia databases in life sciences, ecommerce, or monitoring applications cannot be browsed manually, but require automatic knowledge discovery in databases (KDD) techniques to detect novel and interesting patterns. Clustering aims at grouping similar objects into clusters, separating dissimilar objects. Density-based clustering has been shown to detect arbitrarily shaped...

A New Visualisation Technique for Knowledge Discovery in OLAP (Abstract)

Introduction: Knowledge discovery in databases (KDD) has as its primary objective the uncovering of new and useful knowledge from huge masses of raw data. Our research in KDD has focused on how visualisation can be exploited in the dialectic process of knowledge discovery [6]. Our approach uses a multi-dimensional data visualisation (MDV) technique that builds upon a refined and improved method ...

Knowledge = Concepts: A Harmful Equation

Research on knowledge discovery in databases (KDD) has been impeded by a limited vision of knowledge, inherited from machine learning (ML) and other branches of computer science. In contrast with KDD and ML, research on automation of scientific discovery (SD) took from natural sciences a broader perspective on knowledge. We analyze the typical ML view of discovery as supervised and unsupervised...

Data Mining and Knowledge Discovery: An Analytical Investigation

In recent years, the exponentially growing amount of data made traditional data analysis methods impractical. Knowledge discovery in databases (KDD) provides a framework for alternative methods that address this problem. In this research we follow the KDD process, develop a mathematical model of transforming data and information into knowledge and create a clustering data mining algorithm. To t...

A Database Interface for Clustering in Large Spatial Databases

Both the number and the size of spatial databases are rapidly growing because of the large amount of data obtained from satellite images, X-ray crystallography or other scientific equipment. Therefore, automated knowledge discovery becomes more and more important in spatial databases. So far, most of the methods for knowledge discovery in databases (KDD) have been based on relational database s...

Journal title:
  • CoRR

Volume abs/1407.3850

Publication date 2014